ABSTRACT

The number of main-sequence stars for which we can observe solar-like oscillations is 
expected to increase considerably with the short-cadence high-precision photometric 
observations from the NASA Kepler satellite. Because of this increase in number of 
stars, automated tools are needed to analyse these data in a reasonable amount of 
time. In the framework of the asteroFLAG consortium, we present an automated 
pipeline which extracts frequencies and other parameters of solar-like oscillations in 
main-sequence and subgiant stars. The pipeline uses only the timeseries data as input 
and does not require any other input information. Tests on 353 artificial stars reveal 
that we can obtain accurate frequencies and oscillation parameters for about three 
quarters of the stars. We conclude that our methods are well suited for the analysis 
of main-sequence stars, which show mainly p-mode oscillations. 
1 INTRODUCTION

Stars with sub-surface convection zones, like the Sun, dis- 
play acoustic oscillations. The stochastic excitation mecha- 
nism limits the amplitudes of the oscillations to intrinsically 
weak values. However, it gives rise to a rich spectrum of 
oscillations. The excited pressure (p) modes probe differ- 
ent interior volumes, with the radial and other low angular- 
degree modes probing as deeply as the core. This differential 
penetration of the oscillations allows the internal structure 
and dynamics to be inferred as a function of depth. Seismic 
studies of the Sun have indeed proven to be very powerful 
in inferring its internal structure. 

The fact that the solar-like oscillations have such small 
amplitudes has made observations of these oscillations in 
stars other than the Sun very challenging. Over the past 
few years asteroseismic observations of main-sequence stars 
up to evolved red-giant stars have been made using Doppler 
velocity measurements from ground-based spectrographs,
e.g. ELODIE (Baranne et al. 1996), CORALIE, HARPS
(Queloz et al. 2001), UCLES and UVES (D'Odorico 2000),
and photometric space-based instruments such as WIRE
(Buzasi 2000), MOST (Matthews et al. 2000) and CoRoT
(Baglin et al. 2006). This has led to detections of solar-like
oscillations in more than ten main-sequence stars and of
the order of one thousand red giants. For a recent, but
pre-CoRoT, review of these results see Bedding & Kjeldsen
(2008). CoRoT results for main-sequence stars are presented
by e.g., Michel et al. (2008); Appourchaux et al.
(2008); García et al. (2009), while De Ridder et al. (2009)
and Hekker et al. (2009) present first CoRoT results for red
giants.

The NASA Kepler satellite was launched successfully 
into an Earth trailing orbit on March 7, 2009. The satellite 
contains a Schmidt telescope with a 0.95-m aperture and 
a 105 deg$^2$ field of view, equipped with a highly sensitive
photometer with a spectral bandpass from 400 to 850 nm.
It is designed to continuously and simultaneously monitor
100 000 stars brighter than 14th magnitude. Kepler will be
pointed towards the constellations Cygnus and Lyra during 
the entire mission, which has a nominal length of 3.5 years. 
For most stars, data will be integrated over 30 minutes, while 
for approximately 512 stars at a time, data with a 1-minute 
cadence will be obtained. Although the driving goal for the 
development of Kepler is to observe transiting Earth-like 
exo-planets, these observations are very well suited for
asteroseismology, including low-amplitude solar-like oscillations.
Even so, asteroseismology will contribute to the exo-planet 
investigations by determining radii of the planet-hosting 
stars, which are needed to extract the planetary radii. The 
radii of stars exhibiting solar-like oscillations can be ob- 
tained using the difference in frequency between modes with 
consecutive radial orders, i.e., the large separation ($\Delta\nu$).
This is a measure of the sound travel time across the star,
which depends on the density of the star. See Stello et al.
(2009) for an overview of current approaches to obtain radii.
The asteroseismic potential of Kepler is described in more
detail by Christensen-Dalsgaard et al. (2008).

Apart from the determination of radii, the detection of 
solar-like oscillations in stars at different epochs along stellar 
evolutionary life cycles offers the prospect to test theories of 
stellar evolution and stellar dynamos for many stars. The 
input data for probing stellar interiors are the mode param- 
eters. Accurate mode parameters are a vital prerequisite for 
robust, accurate inference on the fundamental stellar param- 
eters. 

We expect in total of the order of a thousand solar- 
like stars to be observed by Kepler in short cadence. Short- 
cadence oscillation data are needed to observe solar-like 
oscillations in main-sequence stars and subgiants as these 
occur at frequencies of the order of a few-hundred up to 
several-thousand micro-Hertz. The short-cadence (1 minute) 
data have a Nyquist frequency of ~8300 $\mu$Hz, while the
Nyquist frequency of the long-cadence (30 minute) data is
~275 $\mu$Hz.
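The Nyquist frequencies quoted above follow from the sampling interval alone. The short sketch below assumes evenly sampled data with nominal cadences of exactly 60 s and 1800 s (the function name is ours, introduced for illustration only); the values in the text are rounded versions of these numbers.

```python
def nyquist_frequency_uhz(cadence_seconds: float) -> float:
    """Nyquist frequency in microhertz for evenly sampled data.

    The Nyquist frequency is 1 / (2 * cadence); the factor 1e6
    converts from Hz to microhertz.
    """
    return 1.0e6 / (2.0 * cadence_seconds)


# Nominal cadences assumed for illustration.
short_cadence_nyquist = nyquist_frequency_uhz(60.0)        # 1-minute data
long_cadence_nyquist = nyquist_frequency_uhz(30.0 * 60.0)  # 30-minute data

print(f"short cadence: {short_cadence_nyquist:.0f} microHz")
print(f"long cadence:  {long_cadence_nyquist:.0f} microHz")
```

Oscillations at a few thousand $\mu$Hz therefore lie far above the long-cadence Nyquist frequency, which is why short-cadence data are required for main-sequence stars.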

In preparation for the Kepler mission, the asteroFLAG 
consortium has developed automated tools to analyse solar-like
oscillations in main-sequence stars (e.g. Huber et al.
2009; Mathur et al. 2009; Mosser & Appourchaux 2009),
and tested these tools extensively, see e.g., Chaplin et al.
(2008); Stello et al. (2009). Automated tools are needed to
cope with the large number of stars we expect to be observed 
with Kepler. Here, we present an automated pipeline, built 
to determine oscillation parameters of solar-like oscillations 
in main-sequence stars and subgiants, which we describe in 
Section 3 (some mathematical details are deferred to Ap- 
pendix A). We compare the results (Section 4) of the auto- 
mated analysis with the input parameters used for simula- 
tions of realistic artificial data of a few hundred stars, pre- 
pared as if they were observed with Kepler in short-cadence 
(Section 2). 



2 SIMULATED DATA 

The simulated time-series are based on stellar parameters
available in the Kepler Input Catalogue (KIC) (Brown et al.
2005) for the main-sequence and subgiants commissioning
and survey targets. For these simulations, stellar parameters 
are randomly chosen within the expected formal and system- 
atic errors around their KIC values and used as inputs to the 
model grid prepared for the Aarhus Kepler pipeline (Quirion 
et al. (2009), in preparation). The resulting parameters and 
model frequencies have then been used for the simulations. 
Rotation effects, granulation, activity and white noise cor- 
responding to the brightness of the target have been added. 
Also the lifetimes of the oscillation modes have been varied. 
All time series are 30 days long with a 1-minute cadence. 



The stellar models were generated with the Aarhus
stellar evolution code (Christensen-Dalsgaard 2008b) using
the OPAL equation of state (Iglesias & Rogers 1996)
along with the Grevesse & Noels (1993) solar mixture using
the OPAL and Alexander & Ferguson (1994) opacity tables.
The frequencies of the p modes were calculated using the
adiabatic pulsation code ADIPLS (Christensen-Dalsgaard
2008a). The timeseries are generated using a combination
of the asteroFLAG and Aarhus simulators (Chaplin et al.
2008; Stello et al. 2004).



3 METHODOLOGY 

We developed an automated pipeline to obtain the follow- 
ing (oscillation) parameters of main-sequence and subgiant 
solar-like oscillators from Fourier spectra of the time-series 
observations: 

• frequency range of oscillations, 

• frequency at which maximum oscillation power occurs, 

• parameterisation of the background of the entire 
Fourier spectrum, 

• average large frequency separation between consecutive 
radial orders, 

• maximum mode amplitude and amplitude envelope of 
the oscillations, 

• linewidth (lifetime) at the frequency of maximum power 
of the oscillations, 

• individual frequencies. 

Our methods to determine these parameters are described 
below, with some mathematical details and error calcula- 
tions in Appendix A. 

We stress here that this pipeline only uses the time- 
series data as input and does not require any other informa- 
tion. 



3.1 Frequency range of the oscillations 

We are looking for high-order, low-degree, solar-like (p-mode)
oscillations, the frequency ($\nu_{n,\ell}$) of which we expect
to follow approximately the asymptotic relation (Tassoul
1980). In the present study we use the following version of
the relation:

$$\nu_{n,\ell} = \Delta\nu\left(n + \frac{\ell}{2} + \epsilon\right) - \ell(\ell+1)D, \qquad (1)$$

where $n$ is the radial order and $\ell$ the angular degree. $\Delta\nu$ is
the large separation, which is sensitive to the sound travel
time across the star and $\epsilon$ is a constant sensitive to the
surface layers. $D$ is related to the small separation ($\delta\nu_{02}$)
between adjacent modes $\ell = 0$ and $\ell = 2$ by the expression
$\delta\nu_{02} \approx 6D$. Because we deal with photometric data it is
unlikely that we can observe $\ell = 3$ modes due to cancellation
effects.
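As an illustration of the asymptotic relation, the following sketch evaluates Eq. 1 and checks the two spacings it implies. The numerical values of $\Delta\nu$, $\epsilon$ and $D$ are illustrative assumptions (roughly solar-like), not values taken from the pipeline.

```python
def asymptotic_frequency(n, ell, delta_nu, epsilon, d):
    """Mode frequency from the asymptotic relation (Eq. 1).

    n        : radial order
    ell      : angular degree
    delta_nu : large separation (microHz)
    epsilon  : surface term (dimensionless)
    d        : small-separation parameter D (microHz)
    """
    return delta_nu * (n + ell / 2.0 + epsilon) - ell * (ell + 1) * d


# Illustrative, roughly solar-like values (assumed for this example).
delta_nu, epsilon, d = 135.0, 1.5, 1.5

# Consecutive radial orders of the same degree are separated by delta_nu.
large_sep = (asymptotic_frequency(21, 0, delta_nu, epsilon, d)
             - asymptotic_frequency(20, 0, delta_nu, epsilon, d))

# The adjacent (n, ell=0) and (n-1, ell=2) modes are separated by 6D.
small_sep_02 = (asymptotic_frequency(20, 0, delta_nu, epsilon, d)
                - asymptotic_frequency(19, 2, delta_nu, epsilon, d))

print(large_sep, small_sep_02)
```

In this approximation the $\ell = 1$ modes fall roughly halfway between consecutive $\ell = 0$ modes, which is what produces peaks at near-equidistant intervals of about $\Delta\nu/2$ in the power spectrum.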

We know from theoretical models and observations in
the Sun and other stars that $\Delta\nu$, $\delta\nu_{02}$ and $\epsilon$ depend slightly
on frequency and angular degree. Because the changes in
$\Delta\nu$ are usually relatively small, we consider $\Delta\nu$ constant
to a first approximation, and search for a frequency range
in which the power spectrum has peaks at near-equidistant
intervals.
Figure 1. Power spectrum of the power spectrum in a window
with equidistant frequency peaks, shown as a function of
frequency [$\mu$Hz]. The power is normalised, such
that the noise level is 1. The $\Delta\nu/2$ feature at ~25 $\mu$Hz and
the $\Delta\nu/4$ feature at ~12 $\mu$Hz are clearly visible.



From 200 $\mu$Hz up to the Nyquist frequency, the power
spectrum is divided into windows of variable width ($w$) depending
on the location of the central frequency of the window
($\nu_{\rm central}$), with $\nu_{\rm central}$ separated by $w/4$. The frequency
$\nu_{\rm central}$ is used as a proxy of $\nu_{\rm max}$ and is therefore expected
to scale with the acoustic cut-off frequency. Hence, $w$ is defined
as $w = (\nu_{\rm central}/\nu_{\rm max\odot}) \cdot w_\odot$, with $\nu_{\rm max\odot} = 3100$ $\mu$Hz,
i.e., the central frequency of the oscillations in the Sun, and
$w_\odot = 2000$ $\mu$Hz, the expected width of the frequency interval
over which we may find oscillations in the Sun, were the
Sun to be observed as a bright star with Kepler.
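The window construction can be sketched as follows. This is a minimal illustration under stated assumptions: the first window is centred at 200 $\mu$Hz, each subsequent centre is advanced by a quarter of the current window width, and the Nyquist frequency is taken as 8333 $\mu$Hz; the function names are ours and do not refer to the pipeline code.

```python
NU_MAX_SUN = 3100.0  # microHz, central frequency of the solar oscillations
W_SUN = 2000.0       # microHz, expected width of the solar oscillation range


def window_width(nu_central):
    """Window width scaled linearly with the central frequency."""
    return nu_central / NU_MAX_SUN * W_SUN


def frequency_windows(nu_start=200.0, nu_nyquist=8333.0):
    """Return a list of (nu_central, width) pairs covering the spectrum.

    Assumption: consecutive centres are separated by w/4, with w the
    width of the current window.
    """
    windows = []
    nu_central = nu_start
    while nu_central <= nu_nyquist:
        w = window_width(nu_central)
        windows.append((nu_central, w))
        nu_central += w / 4.0
    return windows


windows = frequency_windows()
for nu_central, w in windows[:3]:
    print(f"centre {nu_central:7.1f} microHz, width {w:6.1f} microHz")
```

Because the width is proportional to the central frequency, the windows are narrow at low frequency, where subgiant oscillations are expected, and broad near the solar value, where a window centred at 3100 $\mu$Hz spans 2000 $\mu$Hz.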

To find equidistant frequency peaks, we compute the
power spectrum of the power spectrum (PS$\otimes$PS), which is
equivalent to the autocorrelation of the time series, in each
frequency window (see Fig. 1 for an example of a PS$\otimes$PS).
Subsequently, we check for the presence of features at predicted
values of $\Delta\nu/2$, $\Delta\nu/4$ and $\Delta\nu/6$. The predicted value
of $\Delta\nu$ is obtained from a power-law relation between $\Delta\nu$ and
$\nu_{\rm max}$ (see Stello et al. (2009) and Hekker et al. (2009)),
and we allowed for a 30% deviation
from the predicted value. When the probability of the
presence of these three features being due to noise is less
than 0.2%, we interpret this as oscillations being present
in the considered window. All windows in which we find
equidistant frequency peaks are selected as part of the frequency
range of the oscillations. For details on the computation
of the probability see Appendix A.
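The way a regular comb of peaks shows up in the PS$\otimes$PS can be illustrated with a noise-free toy spectrum. The sketch below is not the pipeline implementation: the synthetic spectrum, the peak heights and widths, and the function name `ps_of_ps` are assumptions made for this example. The abscissa of the second transform is converted to a spacing in frequency units by taking the reciprocal of the lag.

```python
import numpy as np


def ps_of_ps(freq, power):
    """Power spectrum of a power spectrum on an evenly spaced grid.

    Returns (spacing, ps2), where spacing = 1 / lag is expressed in the
    same units as freq, so a comb with period P gives features at
    P, P/2, P/3, ...
    """
    df = freq[1] - freq[0]
    residual = power - np.mean(power)           # remove the zero-lag term
    ps2 = np.abs(np.fft.rfft(residual)) ** 2
    lag = np.fft.rfftfreq(len(residual), d=df)  # conjugate variable of freq
    return 1.0 / lag[1:], ps2[1:]               # skip lag = 0


# Toy spectrum: alternating ell = 0 and ell = 1 peaks every delta_nu / 2.
delta_nu = 100.0                                # microHz (assumed)
freq = np.arange(2000.0, 4000.0, 0.5)           # microHz
power = np.ones_like(freq)                      # flat background of 1
for k, centre in enumerate(np.arange(2025.0, 4000.0, delta_nu / 2.0)):
    height = 1.0 if k % 2 == 0 else 1.5         # relative ell = 0, 1 power
    power += 20.0 * height * np.exp(-0.5 * ((freq - centre) / 1.0) ** 2)

spacing, ps2 = ps_of_ps(freq, power)
strongest_spacing = spacing[np.argmax(ps2)]
print(f"strongest feature at {strongest_spacing:.1f} microHz (delta_nu / 2)")
```

In this toy case the dominant feature sits at $\Delta\nu/2$ and its harmonics appear at $\Delta\nu/4$ and $\Delta\nu/6$, which is the pattern the pipeline tests for. In real data the spectrum is noisy and the significance of these features has to be assessed statistically, as described in Appendix A.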

In subgiants, for which we expect oscillations in the
frequency range 100-1000 $\mu$Hz, the assumption of regularly
spaced frequencies may no longer be valid due to
the presence of g-modes and mixed modes. The increased
luminosity of these more extended stars results in higher
mode amplitudes ($A$). This is because $A \propto (L/M)^s$, with
$L$ and $M$ the stellar luminosity and mass, respectively. The
value of exponent $s$ is of the order of 1, but still debated
in the literature, e.g., see Kjeldsen & Bedding (1995) and
Samadi et al. (2005). As a result of the increased mode amplitudes
we expect oscillations with a good signal-to-noise
ratio and therefore the presence of prominent peaks in the
power spectrum. Therefore, in case we did not find equidistant
frequency peaks, we fit a background signal including
granulation, activity and white noise to the power spectrum
and check whether there is a significant power excess with
respect to this fit in the frequency range 100-1000 $\mu$Hz
(see Fig. 6 for examples of such fits). If this is the case, the
frequency range of this power excess is taken to be the oscillation
frequency range. We note here that before applying
this fitting to real data one has to check for possible observational
artefacts in the data, which can possibly have a
similar signal in the power spectrum.

In case we do not detect any interval with equidistantly
spaced frequencies, or significant power excess due to oscillations,
we search once more for oscillation frequencies separated
by $\Delta\nu$, but now we do not assume $\Delta\nu$ to be constant
with frequency. To account for this frequency dependency,
we stretch (or compress) the frequency axis of the power
spectrum slightly. This stretching is performed in such a
way as to produce an equidistant pattern of peaks on the
stretched, as opposed to the original, frequency axis. The
PS$\otimes$PS of the stretched power spectrum will therefore show
a stronger (more prominent) signature of the large spacing
than the PS$\otimes$PS of the original spectrum.

The stretching depends on the value of $\Delta\nu$ and on the
frequency range of the oscillations. We assume a maximum
of 10% change in $\Delta\nu$ over the oscillation frequency range.
The maximum stretch ($s_{\rm max}$) is therefore:

$$s_{\rm max} = \frac{0.1\,\Delta\nu}{\mathrm{frequency\ range}/\Delta\nu}. \qquad (2)$$



Then we compute the stretched frequency ($\nu_{\rm stretch}$) as follows:

$$\nu_{\rm stretch} = (\nu - \nu_c) - j \cdot s_{\rm max} \cdot \left(\frac{\nu - \nu_c}{\Delta\nu}\right)^2, \qquad (3)$$



where $\nu_c$ denotes the central frequency of the considered frequency
range, which will usually be the frequency at which
maximum power occurs, i.e., $\nu_{\rm max}$. Furthermore, $j$ is an integer
which may have both positive and negative values, i.e.,
negative stretching means effectively compressing the power
spectrum. To find the optimum stretch value we search for
the value of $j$ for which we find minimum probability of the
features in the PS$\otimes$PS to be due to noise.



3.2 Background signal 

A background signal (bg) consisting of granulation, activity
and white noise is fitted to a binned power spectrum, in which
we computed the average power over independent bins. The
frequency range of the oscillations is excluded. Granulation
and activity are represented by power laws, from which we
obtain the time scales ($\tau_{\rm gran}$ and $\tau_{\rm act}$) and power ($P_{\rm gran}$ and
$P_{\rm act}$) of both phenomena respectively (see Eq. 4). The granulation
exponent $a$ is left as a free parameter, while the
activity exponent is fixed to 2. Fixing the activity exponent
is justified by the fact that we assume an exponential decay
of the activity over time. For the granulation the exponent
is a free parameter, as in the original Harvey model (Harvey
1985) the granulation is modelled with three exponentially
decaying power laws. For the present data, we can only fit
one power law for the granulation due to the limited resolution
and the input in the simulations, and therefore we
do not assume exponential decay, i.e., we do not fix the exponent at 2.
Note that the background in the simulated data on which
we tested our pipeline comprised two power laws.
Should we find that more than two power laws are needed
in real data, it will be straightforward to modify the code
accordingly. In addition to the power laws, we add an offset
$b$ which contains mostly white noise. However, in cases
where no oscillations can be detected some oscillation signal
might also be present in this offset. The final form of the
background model used is:

$$bg = \frac{P_{\rm gran}}{1 + (\tau_{\rm gran}\,\nu)^{a}} + \frac{P_{\rm act}}{1 + (\tau_{\rm act}\,\nu)^{2}} + b. \qquad (4)$$

For the input parameters we chose for $P_{\rm act}$ and $P_{\rm gran}$ the
maximum power of the binned power spectrum and 0.001
times this value, respectively. Furthermore, the inputs for
$\tau_{\rm act}$ and $\tau_{\rm gran}$ were 100 000 and 1 000 seconds, while the input
value for $b$ was the mean power at high frequencies outside
the oscillation range. As a first estimate, we chose $a$ to be
equal to 2. To obtain the optimal fit, we vary the input
parameters slightly. We randomly select one of the fitting
parameters and multiply this by:

$$1 + 0.3 \times rn, \qquad (5)$$

where $rn$ denotes a random number that has a normal distribution
with a mean of zero and standard deviation of one.
To minimise boundary effects from the excluded oscillation
frequency range we vary this range by extending or shortening
it on both sides by 1/6 of the length of the original
frequency range. After repeating this 200 times for each excluded
oscillation range, the fit with the lowest $\chi^2$ is used
as the best-fitting background fit.

In a few cases, the fitting with two power laws does not
work properly. This is because only one decaying profile is
visible in the data, either due to the presence of the oscillations
at the same frequency as the hump of the second
decaying profile, or due to too low a signal-to-noise ratio,
i.e., high white noise or low signal, or a combination of the
two. In these instances we fit only for one power law and
the offset $b$. This does not provide us with an optimal fit at
low frequencies (below ~10 $\mu$Hz), and the parameters cannot
be used to infer properties of granulation and activity.
However, at higher frequencies (> 100 $\mu$Hz), where the oscillations
reside, the single power-law fit provides a reasonable
estimate of the background. The standard deviations of the
fitting parameters are used as errors.

Figure 2. $\nu_{\rm max}$ results from method I (top) and method II (bottom)
(see Section 3.4) vs. $\nu_{\rm max}$ computed using the input $M$, $R$
and $T_{\rm eff}$ in Eq. 12. The one-to-one relations are indicated with a
dashed line.

© 2009 RAS, MNRAS 000

S. Hekker et al. 2009
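The background model of Eq. 4 and the random perturbation of Eq. 5 can be sketched as below. This is an illustration only: the function and parameter names are ours, the starting values are placeholders in arbitrary power units, and the actual minimisation of $\chi^2$ (which the pipeline repeats 200 times per excluded range) is left out. Frequencies are assumed to be in Hz so that the products with the time scales in seconds are dimensionless.

```python
import numpy as np


def background(nu_hz, p_gran, tau_gran, a, p_act, tau_act, b):
    """Background model of Eq. 4: granulation + activity + offset."""
    granulation = p_gran / (1.0 + (tau_gran * nu_hz) ** a)
    activity = p_act / (1.0 + (tau_act * nu_hz) ** 2)
    return granulation + activity + b


def perturb_one_parameter(params, rng, scale=0.3):
    """Multiply one randomly chosen parameter by 1 + scale * rn (Eq. 5).

    rn is drawn from a normal distribution with zero mean and unit
    standard deviation. Returns the new dictionary and the changed key.
    """
    names = sorted(params)
    key = names[rng.integers(len(names))]
    perturbed = dict(params)
    perturbed[key] = params[key] * (1.0 + scale * rng.standard_normal())
    return perturbed, key


# Placeholder starting values following the recipe in the text:
# p_act = maximum binned power, p_gran = 0.001 times that value.
max_power = 1.0e4
start = {
    "p_act": max_power, "p_gran": 1.0e-3 * max_power,
    "tau_act": 1.0e5, "tau_gran": 1.0e3, "a": 2.0, "b": 0.5,
}

rng = np.random.default_rng(seed=1)
trial, changed = perturb_one_parameter(start, rng)
nu_hz = np.logspace(-6, -2, 5)  # 1 microHz to 10 mHz
print(changed, background(nu_hz, **trial))
```

Each perturbed parameter set would serve as the starting point of a new fit to the binned spectrum, and the fit with the lowest $\chi^2$ is retained.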



3.3 Average large separation 

For the estimation of the large separation ($\Delta\nu$) we compute
the PS$\otimes$PS in the frequency range of the oscillations. Here,
we take into account that $\Delta\nu$ depends on frequency and
compute the PS$\otimes$PS of a power spectrum with a stretched
frequency axis. Determining $\Delta\nu$ from the stretched power
spectrum provides a more reliable measure of $\Delta\nu$ and, if required,
an estimate of the gradient of the large spacing with
$n$, $\delta\Delta\nu/\delta n$, can be made. For more details on the stretching
see the last paragraph of Section 3.1. The derivation of the
gradient of the large separation is presented in Appendix
A. In the PS$\otimes$PS (see Fig. 1) we determine the position of
the $\Delta\nu/2$ and $\Delta\nu/4$ features. The centroids and uncertainties
of these features are computed in two ways. In the first
method, we determine the power weighted centroids of the
features in the PS$\otimes$PS and their errors are computed as the
standard deviation of grouped data (see Eq. A6 in Appendix
A). In a second method we compute the Bayesian posterior
probability of the points in the PS$\otimes$PS, using the same equations
as for the individual frequencies, i.e., Eqs. 8-10, which
are discussed in Section 3.6 (see also Broomhall et al. (2009)
and Appourchaux et al. (2009)). Using these probabilities we
compute the posterior weighted centroid of the feature. The
interval with a probability of the feature not being due to
noise higher than 68.27%, i.e., 1$\sigma$ in a Gaussian distribution,
is used as the uncertainty interval. Finally, for both methods,
we determine $\Delta\nu$ by computing a weighted average of
$\Delta\nu/2$ and $\Delta\nu/4$.
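The first method can be sketched as follows. Two assumptions are made here and are not taken from the pipeline: the spread of a feature is computed as a weighted standard deviation (Eq. A6 may differ in detail), and the two features are combined with inverse-variance weights after scaling the $\Delta\nu/2$ and $\Delta\nu/4$ centroids by 2 and 4. All function names are illustrative.

```python
import numpy as np


def weighted_centroid(x, weights):
    """Power-weighted centroid and weighted standard deviation."""
    total = np.sum(weights)
    centroid = np.sum(weights * x) / total
    spread = np.sqrt(np.sum(weights * (x - centroid) ** 2) / total)
    return centroid, spread


def combine_delta_nu(half, half_err, quarter, quarter_err):
    """Inverse-variance weighted average of the two delta_nu estimates.

    The centroid of the delta_nu/2 feature is scaled by 2 and that of
    the delta_nu/4 feature by 4, together with their uncertainties.
    """
    estimates = np.array([2.0 * half, 4.0 * quarter])
    errors = np.array([2.0 * half_err, 4.0 * quarter_err])
    w = 1.0 / errors ** 2
    value = np.sum(w * estimates) / np.sum(w)
    return value, np.sqrt(1.0 / np.sum(w))


# Toy feature around 25 microHz in a PS-of-PS (values assumed).
x = np.array([24.0, 25.0, 26.0])
ps2 = np.array([1.0, 2.0, 1.0])
centroid, spread = weighted_centroid(x, ps2)

delta_nu, delta_nu_err = combine_delta_nu(50.0, 1.0, 25.0, 0.5)
print(centroid, spread, delta_nu, delta_nu_err)
```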

We also compute $\Delta\nu$ from an autocorrelation of the
full oscillation frequency range and from an autocorrelation
using the individual oscillation frequencies determined with
the Bayesian approach only (again, see Section 3.6). A Gaussian
is fitted to the feature at $\Delta\nu$ in the respective autocorrelations
and the width of the Gaussian is used as an estimate
of the error.



3.4 Maximum mode amplitude, amplitude 
envelope and frequency of maximum 
amplitude 

Our next package provides estimates of the maximum mode 
amplitude, and the mode-amplitude envelope as a function 
of frequency. Our results are scaled to be equivalent radial- 
mode amplitudes. 

In summary, for method I we began by subtracting
the background fit from the power spectrum. The resulting,
residual power spectrum is averaged over the range occupied
by the modes using a boxcar filter of width $3\Delta\nu$. Next,
we multiply this averaged, residual spectrum by the large
frequency spacing $\Delta\nu$, and finally we divide by a constant
factor, $c$, to allow for the effective number of modes in each
slice $\Delta\nu$ of the spectrum. The value of $c$ is chosen so that
the above procedure gives observational estimates, as a function
of frequency, of the power envelope for radial modes. $c$ is
computed assuming the presence of 4 frequencies in each $\Delta\nu$
interval with $\ell = 0, 1, 2, 3$, with relative power per mode of
1.0, 1.5, 0.5, 0.03 respectively. $c$ is the total power we expect
in a $\Delta\nu$ interval, i.e., 3.03. There is a slight dependence of $c$
on limb darkening and thus on $T_{\rm eff}$ and $\log g$, but the values
change only by a few percent and we ignore those changes
here.

Figure 3. Top: $\Delta\nu$ results from the PS$\otimes$PS vs $\Delta\nu$ computed using the input $M$ and $R$ in Eq. 13. In the left panel the $\Delta\nu$ values are
computed as the power weighted mean centroids of the features in the PS$\otimes$PS. In the right panel $\Delta\nu$ values are computed using the
Bayesian posterior probability to compute the centroids of the features in the PS$\otimes$PS. Bottom: $\Delta\nu$ results from autocorrelations vs $\Delta\nu$
computed using the input $M$ and $R$ in Eq. 13. In the left panel the autocorrelations of the full oscillation frequency range are used to compute
$\Delta\nu$. In the right panel the individual frequencies obtained using a Bayesian approach are used to compute the autocorrelations, from
which we determine $\Delta\nu$. The one-to-one relations are indicated with a dashed line.

The highest value of the power envelope is an estimate
of the maximum mode power. The mode amplitude envelope
and the maximum mode amplitude are given by the square
root of the power envelope, and square root of the maximum
power, respectively. The frequency at which the maximum
mode power occurs is $\nu_{\rm max}$.
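The steps of method I can be sketched in a few lines. The toy spectrum below (a Gaussian power excess on a flat background) and all names are assumptions for illustration; only the boxcar width of $3\Delta\nu$, the multiplication by $\Delta\nu$ and the factor $c = 3.03$ are taken from the text. The power is assumed to be a power density, so that multiplying by $\Delta\nu$ gives power per mode.

```python
import numpy as np

# Relative power per mode for ell = 0, 1, 2, 3; their sum is c = 3.03.
RELATIVE_POWER = (1.0, 1.5, 0.5, 0.03)
C = sum(RELATIVE_POWER)


def radial_power_envelope(freq, power, background, delta_nu, n_delta_nu=3.0):
    """Radial-mode power envelope following the steps of method I.

    1. subtract the background fit,
    2. smooth with a boxcar of width n_delta_nu * delta_nu,
    3. multiply by delta_nu and divide by c.
    """
    df = freq[1] - freq[0]
    n_bins = max(1, int(round(n_delta_nu * delta_nu / df)))
    boxcar = np.ones(n_bins) / n_bins
    smoothed = np.convolve(power - background, boxcar, mode="same")
    return smoothed * delta_nu / C


# Toy data: Gaussian power excess centred on 3000 microHz (assumed).
freq = np.arange(1000.0, 5000.0, 1.0)
background = np.full_like(freq, 0.2)
excess = 2.0 * np.exp(-0.5 * ((freq - 3000.0) / 300.0) ** 2)

envelope = radial_power_envelope(freq, background + excess, background,
                                 delta_nu=100.0)
nu_max = freq[np.argmax(envelope)]
max_amplitude = np.sqrt(np.max(envelope))
print(f"nu_max = {nu_max:.0f} microHz, max amplitude = {max_amplitude:.2f}")
```

Note that zero-padding in the convolution lowers the envelope within half a boxcar width of the edges of the spectrum, so the estimate is only meaningful away from the boundaries.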

We choose to average the spectrum using a boxcar filter
as opposed to the Gaussian filter (of width $4\Delta\nu$) adopted
by Kjeldsen et al. (2008). This is because it allows us to
estimate very straightforwardly uncertainties for the amplitudes,
here in independent frequency ranges of width $3\Delta\nu$.
For example, we estimate the uncertainty of the maximum
mode power as the standard deviation of the powers in each
bin in the frequency range $3\Delta\nu$ that contribute to the estimated
maximum power. We may then calculate independent
averages in ranges on either side of the maximum, to give an
estimated power envelope with uncertainties. Uncertainties
for the amplitudes follow by remembering that fractional errors
on the mode amplitudes are equal to half those on the
mode powers.

Our decision to average over $3\Delta\nu$ was to some extent
determined by the following obvious compromise. The nar- 
rower the range, the more we avoid smoothing out poten- 
tially interesting features of the amplitude envelope, while 
the wider the range, the less subject we are to fluctuations 
due to the stochastic nature of the modes. But there is also 
another important factor, which argues for adopting a wider 
range: that is to suppress biases in the estimated maximum 
mode amplitudes when the signal-to-noise ratio is quite low. 

The frequency at which maximum oscillation power occurs
($\nu_{\rm max}$) is computed as the weighted mean frequency
of the oscillation power with the error computed from the
standard deviation of grouped data (see Eq. A6).

In a second method (hereafter method II), we fit a Gaussian
to the binned oscillation power, where the binning is
performed over intervals of $2\Delta\nu$. The height of the Gaussian
fit is then converted to amplitude per radial mode by
multiplying by $\Delta\nu/c$, as per the other approach. We use
the standard deviation of the fit parameters to compute the
errors. The centre of the Gaussian fit is $\nu_{\rm max}$.



3.5 Linewidth of most prominent modes 

We seek a straightforward and robust method for determin- 
ing from the power spectrum the linewidth shown by the 
most prominent modes. 

Our method relies on the fact that the height in the
power spectrum of a solar-like (i.e., damped) mode peak depends
not only on the total power of the mode, but also (crucially
to our method here) on the linewidth (or equivalently
the damping time) of the mode and the intrinsic resolution in
frequency of the spectrum. The height, $H$, in units of power
per Hertz is well described in both the resolved and unresolved
regime by (Fletcher et al. 2006; Chaplin et al. 2009):

$$H(T) = \frac{2A^2T}{\pi T\Delta + 2}, \qquad (6)$$

where $A^2$ is the total power of the mode, $\Delta$ is the FWHM
linewidth of the mode peak, and $T$ is the effective length of
the observations. We may re-express Equation 6 in terms of
the intrinsic (or natural) resolution in frequency $\delta = 1/T$.
Substitution and subsequent re-arrangement of the equation
then gives the following

$$\frac{2A^2}{H(\delta)} = \pi\Delta + 2\delta, \qquad (7)$$

which is the form required to explain our method. For a
range of values of $\delta$ we estimate the ratio $A^2/H(\delta)$ of the
most prominent radial mode in the spectrum (as explained
in the next paragraph below). A plot of $A^2/H(\delta)$ versus
$\delta$ then yields data following a linear relationship. We fit a
straight line to the data, and the intercept on the ordinate,
which equals $\pi\Delta/2$ for the ratio $A^2/H(\delta)$,
in principle provides an estimate of the linewidth, $\Delta$. Evaluation
of the spectrum at different $\delta$ is achieved by averaging
the spectrum of the full timeseries over different numbers
of bins $M$ (thereby degrading the intrinsic resolution as required).
If $T$ is taken to be the effective length of the timeseries,
this means that $\delta = M/T$.
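The straight-line fit can be illustrated with idealised, noise-free data. Dividing Eq. 6 into $A^2$ gives $A^2/H(\delta) = \pi\Delta/2 + \delta$, so a linear fit of the ratio against $\delta$ has unit slope and an intercept of $\pi\Delta/2$. In the sketch below the input linewidth, the range of $M$ and the function name are assumptions; real estimates of $H(\delta)$ are only a proxy and need the empirical correction described in the text, which is not included here.

```python
import numpy as np


def linewidth_from_ratio(delta, ratio):
    """Fit ratio = slope * delta + intercept and convert to a linewidth.

    With ratio = A^2 / H(delta), the intercept equals pi * linewidth / 2.
    Returns (linewidth, slope).
    """
    slope, intercept = np.polyfit(delta, ratio, 1)
    return 2.0 * intercept / np.pi, slope


# Resolution of a 30-day time series, in microHz.
t_seconds = 30.0 * 86400.0
delta_0 = 1.0e6 / t_seconds

# Averaging over M bins degrades the resolution to delta = M / T.
m_bins = np.arange(1, 9)
delta = m_bins * delta_0

# Idealised ratio for an assumed linewidth of 1.2 microHz.
true_linewidth = 1.2
ratio = np.pi * true_linewidth / 2.0 + delta

linewidth, slope = linewidth_from_ratio(delta, ratio)
print(f"recovered linewidth = {linewidth:.3f} microHz, slope = {slope:.3f}")
```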

We estimate the ratio $A^2/H(\delta)$ as follows. We already
have an estimate of $A^2$ courtesy of the mode amplitude package
in Section 3.4. To estimate the heights $H(\delta)$ in each
$M$-bin-averaged spectrum we simply take the highest power
spectral density in the range $\Delta\nu/2$ about $\nu_{\rm max}$. These estimates
are only a proxy of the true, underlying $H(\delta)$, which
means that to correctly estimate linewidths $\Delta$ we must apply
an empirical correction to the results. We found from
simulations that a linear correction with both an offset and
slope of 0.4 applied to the raw estimates of the linewidth is
sufficient for this purpose.

Our simple proxy of $H(\delta)$ may sometimes have been estimated
from the most prominent $\ell = 1$ mode (depending on
the inclination of the star), when it is the height of the most
prominent $\ell = 0$ mode that we require. In our first version of
the pipeline, we accept this potential uncertainty, and note
that its main effect will be to add some additional scatter
to the results. Any bias is taken care of by the empirical
correction above.
Figure 4. Results for the average mode amplitude from method I
(top), Kjeldsen et al. (2008) (centre) and method II (bottom) vs.
the input mode amplitude. The one-to-one relations are indicated
with a dashed line.



3.6 Individual frequencies 

For the determination of individual frequencies we used a
Bayesian approach adapted from Broomhall et al. (2009)
and references therein. We want to test whether the power
at each frequency in the power spectrum could be the result
of a component of a stochastically excited mode ($H_1$
hypothesis) or due to noise ($H_0$ hypothesis).

As explained by Appourchaux et al. (2009), we aim to
compute the posterior probability of $H_0$ ($p(H_0|x)$) given the
observed data $x$, i.e.,

$$p(H_0|x) = \left(1 + \frac{p(x|H_1)}{p(x|H_0)}\right)^{-1}, \qquad (8)$$

where we assumed that both hypotheses are, a priori, equally
probable. For the $\chi^2$ 2 d.o.f. statistics of the power spectrum,
the probability of observing $x$ given that there is only noise,
i.e., the probability of observing $x$ if the $H_0$ hypothesis is
true, ($p(x|H_0)$) is:

$$p(x|H_0) = e^{-x}, \qquad (9)$$

where $x$ is the observed power divided by the background.

For the alternative hypothesis $H_1$, i.e., the probability
of observing $x$ given that there is signal, we assume that
we do not know a priori the mode height $H$. Therefore, we
assume that the height can be taken from a uniform distribution
between 0 and $H_s$ (see Appendix A for details on the
determination of $H_s$). We can then compute the probability
of observing $x$ if the $H_1$ hypothesis is true ($p(x|H_1)$) as
follows (Appourchaux et al. 2009):

$$p(x|H_1) = \frac{1}{H_s}\int_0^{H_s} \frac{1}{1+h}\, e^{-x/(1+h)}\, dh. \qquad (10)$$
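Equations 8-10 can be evaluated numerically for a single frequency bin as sketched below. The integral in Eq. 10 is approximated here with a simple trapezoidal rule, the value of $H_s$ is an arbitrary placeholder (its determination is described in Appendix A), and the function names are ours.

```python
import numpy as np


def likelihood_h0(x):
    """p(x | H0) for chi-squared 2 d.o.f. statistics (Eq. 9)."""
    return np.exp(-x)


def likelihood_h1(x, h_s, n_steps=2001):
    """p(x | H1) for a uniform prior on the height in [0, h_s] (Eq. 10).

    x is a scalar power-to-background ratio; the integral is evaluated
    with the trapezoidal rule on n_steps points.
    """
    h = np.linspace(0.0, h_s, n_steps)
    integrand = np.exp(-x / (1.0 + h)) / (1.0 + h)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(h))
    return integral / h_s


def posterior_h0(x, h_s):
    """Posterior probability of the noise hypothesis (Eq. 8)."""
    return 1.0 / (1.0 + likelihood_h1(x, h_s) / likelihood_h0(x))


h_s = 10.0  # placeholder maximum height, in units of the background
for x in (1.0, 5.0, 10.0, 20.0):
    print(f"x = {x:5.1f}  p(H0|x) = {posterior_h0(x, h_s):.2e}")
```

Bins with a power close to the background level retain a high posterior probability for $H_0$, while only bins with a power many times the background fall below a strict threshold.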



Frequencies with a posterior probability less than 0.5% are
taken to be candidate oscillation frequencies. The 0.5%
posterior probability was chosen based on 1000 Monte
Carlo simulations performed with the asteroFLAG code
(Chaplin et al. 2008) of solar analogs, for which we computed
the fraction of real detections for different posterior
probabilities (see Fig. A1). The final frequencies ($\nu_{\rm final}$) were
computed using the parameter estimation:

$$\nu_{\rm final} = \frac{\int \nu\, p(H_0|x)\, d\nu}{\int p(H_0|x)\, d\nu}. \qquad (11)$$

For the integration range, we use the frequency range for
which we found that the posterior probability to find signal
was larger than 68.27%, i.e., 1$\sigma$ in a Gaussian distribution.
We also used this interval as the estimated error. We tested
this error estimation by performing 1000 Monte Carlo simulations
of one single stochastically excited mode with flat
background noise. For each simulation we computed the final
frequency and its error. Then we expressed the offset between
the computed frequency and input frequency in terms
of its error. We see that for 77% of the tests the offset is
within 3$\sigma$ and for 89% of the tests it is within 5$\sigma$.

In cases where more than three significant oscillation 
frequencies could be detected, we used these individual fre- 
quencies to compute the large separation from the autocor- 
relation of the frequencies. 



3.6.1 Small separations 

We have investigated for how many stars we might be able
to find the small separation ($\delta\nu_{02}$), i.e., for how many stars
we could see more than two ridges in the echelle diagram.
This was the case for less than 10% of the stars. This
low number is most likely caused in part by the fact that
we set the threshold posterior probability at only 0.5%,
which reduces the false alarm rate, but also the number of
identified frequencies. Therefore, for most main-sequence
stars only two ridges are present in the echelle diagram.
Because of the low percentage of stars for which we
might be able to identify the small separation with the
strict thresholds currently applied, we do not include such
computation in the automated pipeline presented here. In
a further analysis using peak-bagging techniques, $\delta\nu_{02}$ will
be obtained. These results will be presented by Fletcher et
al. (2009), in preparation.

Figure 5. Line width as a function of temperature, with results
for stars brighter than 9th mag indicated with black dots. Results
for fainter stars are indicated with grey diamonds. The $\Delta \propto T_{\rm eff}^4$
input relation (Chaplin et al. 2009) is indicated with a dashed
line.



4 RESULTS 

We are able to detect oscillations in 260 out of the 353 artificial
stars, i.e., about 74%. For these stars we estimated the
oscillation parameters and background as described in the 
previous section, which we then compared with the input 
values used to create the artificial data. 



4.1 Oscillation parameters 

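In Figs. 2 and 3 we compare our results for $\nu_{\rm max}$ and $\Delta\nu$ with
the values computed from the input mass ($M$), radius ($R$)
and effective temperature ($T_{\rm eff}$), using the scaling relations
(Kjeldsen & Bedding 1995):

$$\nu_{\rm max} = \frac{M/M_\odot}{(R/R_\odot)^2 \sqrt{T_{\rm eff}/5777\,{\rm K}}}\; 3050\,\mu{\rm Hz}, \qquad (12)$$

$$\Delta\nu = \sqrt{\frac{M/M_\odot}{(R/R_\odot)^3}}\; 134.9\,\mu{\rm Hz}. \qquad (13)$$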
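
The scaling relations of Eqs (12) and (13) can be evaluated directly. The sketch below is a minimal Python illustration; the function and constant names are hypothetical, and the inputs are assumed to be mass and radius in solar units and effective temperature in kelvin.

```python
import math

NU_MAX_SUN = 3050.0    # muHz, reference value in Eq. (12)
DELTA_NU_SUN = 134.9   # muHz, reference value in Eq. (13)
TEFF_SUN = 5777.0      # K, reference temperature in Eq. (12)


def nu_max_scaling(mass, radius, teff):
    """Eq. (12): mass and radius in solar units, teff in K; result in muHz."""
    return mass / (radius ** 2 * math.sqrt(teff / TEFF_SUN)) * NU_MAX_SUN


def delta_nu_scaling(mass, radius):
    """Eq. (13): mass and radius in solar units; result in muHz."""
    return math.sqrt(mass / radius ** 3) * DELTA_NU_SUN
```

For solar input values (`mass = 1`, `radius = 1`, `teff = 5777`) the two functions return the reference values 3050 and 134.9 $\mu$Hz by construction.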



The values obtained for both $\nu_{\rm max}$ (Fig. 2) and $\Delta\nu$
(Fig. 3) are in good agreement with the input values, for
each of the implemented methods, although a slight overestimation
is present in the determined $\nu_{\rm max}$ values from
method I at higher frequencies where the height of the oscillations
decreases (Chaplin et al. 2009). For method I, 54%
of our $\nu_{\rm max}$ values agree within uncertainties with the input
values, while this percentage increases to 97% within 3 times
the computed uncertainties. For method II, we find that 37%
of our $\nu_{\rm max}$ values agree with the input values within their
uncertainties, which increases to 91% agreement within 3
times the uncertainties. In 96% of the stars for which we
could detect oscillations we found $\nu_{\rm max}$ with both methods.

Figure 6. Power spectra of 4 artificial Kepler stars with oscillations at different frequencies, displayed in a log-log scale. The red lines
indicate the binned power spectrum and the green lines the background fits, with the individual components shown in cyan dashed lines.
The identified oscillation frequency ranges are indicated with vertical blue lines. No oscillations could be identified in the top left power
spectrum, while for the bottom right power spectrum we could only fit one power law for the background.

For $\Delta\nu$ we found that 93%, 90%, 86% and 97% of our
values agree with the input values within 5%, for results
computed with the weighted mean of the features in the
PS$\otimes$PS, the Bayesian probabilities in the PS$\otimes$PS, the
autocorrelation of the full oscillation frequency range and the
autocorrelation of individual frequencies, respectively. The
uncertainties for $\Delta\nu$ computed with the Bayesian probabilities in
the PS$\otimes$PS are larger and seem more realistic than for all
other methods. For this method 94% of our values agree
with the input within 3 times the uncertainties, while this is
only 44%, 20% and 7% for the other methods, i.e., weighted
mean of the features in the PS$\otimes$PS, autocorrelation of the full
oscillation frequency range and autocorrelation of individual
frequencies, respectively. For the first three methods we
find $\Delta\nu$ in all stars for which we detect oscillations. For $\Delta\nu$
computed with the individual frequencies we have results
for 60% of the stars. Owing to the better error estimate, the
results of the Bayesian probabilities in the PS$\otimes$PS are the most
reliable. Despite their underestimated errors, the $\Delta\nu$ values
computed with the other methods are in more than 90% of
the cases compatible (within 3 times the uncertainties) with
the Bayesian values.
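
The agreement percentages quoted in this section are fractions of stars whose result lies within a multiple of the quoted uncertainty of the input value. A minimal Python sketch of such a check follows; the function name `fraction_within` and its arguments are hypothetical, and this is not the pipeline's own code.

```python
def fraction_within(values, inputs, errors, k=1.0):
    """Fraction of results with |value - input| <= k * quoted error."""
    if not (len(values) == len(inputs) == len(errors)) or len(values) == 0:
        raise ValueError("need equally long, non-empty sequences")
    hits = sum(
        1 for v, t, e in zip(values, inputs, errors) if abs(v - t) <= k * e
    )
    return hits / len(values)
```

A method with underestimated errors shows up in such a check as a low fraction even at `k = 3`, which is the pattern described above for the non-Bayesian $\Delta\nu$ methods.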

In Fig. 4 our results for the maximum amplitude per
radial oscillation mode are shown as a function of the input
maximum amplitude per radial mode. For comparison we
also computed estimates of the maximum amplitude using
the method of Kjeldsen et al. (2008). The results are consistent
with the one-to-one relation, and for each method
$\sim$90% of our amplitude values are consistent with the input
values within 3 times the computed uncertainties.

Also, we computed the width of the frequency peaks in
the power spectrum. The results are shown in Fig. 5. The
input values of the line width follow the relation $\Delta \sim T_{\rm eff}^4$,
and we see in Fig. 5 that for stars brighter than 9th magnitude
(black dots) our results for $\Delta$ are qualitatively in
agreement with this relation. Good signal-to-noise ratio is
required for this method and for fainter stars the scatter in
the results becomes considerably larger.

Four examples of background fits to power spectra with 
oscillations at different frequencies are shown in Fig. 6, all of
which have a $\chi^2$ of the order of 1. We also compare in Fig. 7
our fitted values of the granulation parameters $p_{\rm gran}$ and
$\tau_{\rm gran}$ with the artificial input values. Here, we see that the
results follow the one-to-one relation with the input value, 
but with non-negligible scatter. 

Note that for stars where a detection was made, we were 
not always able to determine the oscillation parameters with 
all methods described in Section 3. For each parameter we 
have at least one method that produces a result for all stars 
with detected oscillations, but for some methods we have re- 
sults for fewer stars, down to 60% of the stars with detected 
oscillations. The quoted percentages are always computed 
for stars with a result for the considered method.

Figure 7. Results of the timescale ($\tau_{\rm gran}$, top) and power ($p_{\rm gran}$,
bottom) of the granulation as a function of the input values. The
one-to-one relations are indicated with a dashed line.

Figure 8. Distribution of apparent magnitudes of stars for which
we detected oscillations (black solid line) and for stars for which
we did not detect oscillations (red dashed line).

4.2 (Non-)detections of oscillations 

Next, we investigate empirically which parameters are of im- 
portance for the detection of oscillations in the data. First, 
we consider the apparent magnitude distribution of the stars 
with and without detected oscillations, see Fig. 8. As expected,
the percentage of stars for which we can detect solar-like
oscillations decreases for fainter stars.

We fitted the background for all stars, independent of
whether we did or did not detect any oscillations. From a
comparison of the distribution of the fitted parameters, we
find that for stars in which we could not detect any oscillations
the offset ($b$ in Eq. 4) is on average larger than for stars
in which we could detect oscillations, while the exponent a
is typically lower. These distributions are shown in Fig. 9.
These results are not unexpected since for stars for which we
could not detect oscillations, the offset contains both noise
and signal, while for stars with detected oscillations the offset
mainly consists of noise. The latter can be seen in the
bottom panel of Fig. 9, where we plot the offset as a function
of the input $\nu_{\rm max}$. We indeed see that for stars with
input $\nu_{\rm max} > 3000$ $\mu$Hz for which we did not detect oscillations
the offset is higher than for stars for which we could
detect oscillations in this frequency range. The exponent a
influences the slope of the granulation in a log-log plot of a 
power spectrum. An increase in the offset will decrease the 
slope and therefore the exponent a. From these distributions 
it might be possible to obtain upper limits on some oscilla- 
tion parameters. We consider such an investigation beyond 
the scope of this paper. 



5 DISCUSSION AND CONCLUSIONS 

With the methods described in Section 3, we could detect
solar-like oscillations and determine their parameters in 260 out
of 353 artificial main-sequence stars and subgiants, and extract
individual frequencies in 154 stars (not discussed further here).
In general, we have tried to be very cautious to reduce the number
of false detections. Special care is taken in the identification
of the oscillation frequency interval, as an incorrectly
identified frequency range will imply misidentifications for
all oscillation parameters and the background fitting. Furthermore,
parameters such as $\nu_{\rm max}$, $\Delta\nu$ and amplitude per
radial mode are determined with two or more (independent)
methods.

The input values for $\nu_{\rm max}$ and $\Delta\nu$ are reproduced by our
analyses for the majority of stars. Also, for the amplitudes 
per radial mode, our values are in agreement with the input 
values. 

The widths and thus the lifetimes of the modes can
be determined with reasonable accuracy for Kepler stars
brighter than 9th magnitude. Our method does not contain
detailed fitting, nor does it take into account that lifetimes
vary for modes of different degrees. Nevertheless, for the
bright stars we do find values consistent with the input relation
$\Delta \sim T_{\rm eff}^4$ (Chaplin et al. 2009).

An accurate determination of the value of the back- 
ground at the oscillation frequencies is important for two 
reasons. First, the background level is taken into account 
in the extraction of oscillation parameters such as the am- 
plitude per radial mode, and, secondly, it provides informa- 
tion on the power and time scales of atmospheric parame- 
ters. From the fact that we can obtain oscillation parame- 
ters which are consistent with the input parameters, we can 
on the one hand infer that our background estimate in the 
oscillation frequency range is accurate enough to determine 
oscillation parameters. On the other hand we still see scatter 
in the determined time scale and power of the granulation 
around the input values. The scatter in the granulation pa- 
rameters does not mean per se that the background level 






Figure 9. Distribution of the exponent a (top) and the offset (middle;
see Eq. 4, in ppm$^2$ $\mu$Hz$^{-1}$), and the offset as a function of input $\nu_{\rm max}$
(bottom; computed from $M$, $R$ and $T_{\rm eff}$, in $\mu$Hz) for stars
with detected oscillations (black solid line, black asterisks) and
stars without detected oscillations (red dashed line, red dots).



in the frequency interval is uncertain, as we fit a function 
with 6 parameters, but the parameters should be treated 
with caution when using them for further investigations of 
activity and granulation. 

In terms of our sensitivity to detect oscillations, we 
found clear evidence that the percentage of stars for which 
we can detect oscillations decreases for fainter stars. Also, 
we found evidence that it is harder to detect solar-like oscillations
in cooler ($T_{\rm eff} < 5500$ K) main-sequence stars than
in the hotter ($T_{\rm eff} > 5500$ K) ones. This is because in the
simulations (as in real data) the pulsation amplitudes scale 
with luminosity, with a weaker dependence on temperature. 



In conclusion, the analysis tools compiled into the pipeline
presented here proved able to identify oscillations for a large
fraction of artificial main-sequence stars. For the majority of
these stars we determine oscillation parameters within $3\sigma$ of
the input values. The existence of such pipelines will be important
for performing asteroseismic analyses on
the many stars we expect to become available from Kepler.

APPENDIX A: METHODOLOGY EXTENSION

Here we provide additional (mathematical) information 
on the methodology described in Section 3, including 
determination of errors. 

Probability of peaks in PS$\otimes$PS. The probability
of the presence of equidistant frequency peaks in the
PS$\otimes$PS being due to noise is computed as follows. We
compute the probability ($p_1$) that a random variable from
a 6 degrees of freedom (dof) $\chi^2$-distribution is larger than
the average height of the three peaks at $\Delta\nu/2$, $\Delta\nu/4$ and
$\Delta\nu/6$ in the PS$\otimes$PS. Each of these peaks has 2 dof, hence
the 6 dof.

Next, we compute the probability of this occurring by
chance at least once over the full $N$ bins of the PS$\otimes$PS.
We must also take account of the fact that in practice we
oversample the PS$\otimes$PS by a factor of 10, so not all bins are
independent. The resulting probability is given by:

where $\beta = 3$ has been shown to provide a robust empirical
correction for the effect of the oversampling (e.g.,
Chaplin et al. 2002; Gabriel et al. 2002). Finally, the probability
that the peaks are not due to noise is just $1 - P$.

A similar procedure is used to compute the probability
of only one peak in the PS$\otimes$PS, such as we use in the
computation of the large separation. In these cases $p_1$ is
computed as the probability that a random variable from
a $\chi^2$ 2-dof distribution is larger than the height of the
considered peak in the PS$\otimes$PS.
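
The two ingredients of this test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation. First, the statistic passed to `chi2_sf_even` is assumed to be expressed in the units of the corresponding $\chi^2$ distribution, since the normalisation of the peak heights is not spelled out here. Second, `n_eff` is a hypothetical parameter standing for the effective number of independent bins after the oversampling correction. How $\beta$ and $N$ combine into that number is left to the caller, and the standard independent-trials formula is used for the final step.

```python
import math


def chi2_sf_even(x, dof):
    """P(X > x) for a chi-square variable with an even number of dof.

    Uses the closed form exp(-x/2) * sum_{j<dof/2} (x/2)^j / j!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def prob_at_least_once(p1, n_eff):
    """Chance that an event of probability p1 occurs at least once
    in n_eff independent trials: 1 - (1 - p1)^n_eff."""
    if p1 >= 1.0:
        return 1.0
    # expm1/log1p keep precision when p1 is very small
    return -math.expm1(n_eff * math.log1p(-p1))
```

With `p1 = chi2_sf_even(x, 6)` for the three-peak case, or `chi2_sf_even(x, 2)` for a single peak, `1.0 - prob_at_least_once(p1, n_eff)` would then play the role of $1 - P$ in the text.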

Gradient of the large spacing. Although results
are not discussed in the paper, we can estimate $\delta\Delta\nu/\delta n$
from the stretching as follows. First, we have

$$\frac{\delta\Delta\nu}{\delta n} = \frac{\delta\Delta\nu}{\delta\nu}\frac{\delta\nu}{\delta n} = \frac{\delta\Delta\nu}{\delta\nu}\cdot\Delta\nu_0, \qquad (A1)$$

with $\Delta\nu_0$ the average large spacing over the frequency or
$n$ range of interest. Now, we may estimate $\delta\Delta\nu/\delta\nu$ by
differentiating Eq. 3.

Figure A1. Fraction of real frequency detections as a function
of posterior probability.

To differentiate the second term on the



right hand side in Eq. [3] we use the substitution x = ^- -1, 
with I 2 = which gives: 



c5 

5^ 



5v 



{cx ) 



2cx 



2c ( v 

V, 



So, after including the differential of the first term on the 
right-hand side, we have 



^stretch _j 

8p v. 



from which we obtain $\Delta\nu$ as a function of frequency:
Av = Au H ' 



(A3) 



(A4) 



Finally, we find that the change in $\Delta\nu$ as a function of $n$ is:
SAu 



SAv . 2cAu , 2cAv% 
-= — = —z — ■ Au = =— • Au = 2 — • 

on ou Vc vt 



(A5) 



The best-fitting $c$ therefore provides a direct
estimate of $\delta\Delta\nu/\delta n$.



Standard deviation of grouped data. The standard
deviation of grouped data is used to compute errors
in $\Delta\nu$ computed from the power-weighted centroids in the
PS$\otimes$PS, where we interpret each feature in the PS$\otimes$PS as
composed of a number of bins with a certain height ($f$) and
midpoint ($x$):

$$\sigma = \sqrt{\frac{\sum f x^2 - \left(\sum f x\right)^2 / \sum f}{\sum f - 1}}. \qquad (A6)$$

The same formula is used to compute the error on $\nu_{\rm max}$
from the frequency-binned oscillation power as computed
in method I for the amplitudes per radial mode. Here,
the total power and central frequency of each bin are
interpreted as $f$ and $x$, respectively.
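
The standard deviation of grouped data can be sketched in Python as below. The function name `grouped_std` is hypothetical. The sketch uses the textbook form with $\sum f - 1$ in the denominator, and how the bin heights are normalised before they enter $\sum f$ is not specified here, so that choice is left to the caller.

```python
import math


def grouped_std(heights, midpoints):
    """Standard deviation of grouped data with heights f and midpoints x."""
    sum_f = sum(heights)
    if len(heights) != len(midpoints) or sum_f <= 1.0:
        raise ValueError("need matching sequences with total height above 1")
    sum_fx = sum(f * x for f, x in zip(heights, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(heights, midpoints))
    variance = (sum_fx2 - sum_fx ** 2 / sum_f) / (sum_f - 1.0)
    # guard against tiny negative values from rounding
    return math.sqrt(max(variance, 0.0))
```

For integer heights the result equals the sample standard deviation of the expanded data: heights `[1, 2, 1]` at midpoints `[1, 2, 3]` give the same value as the list `[1, 2, 2, 3]`.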



Bayesian signal hypothesis $H_1$. For the computation
of the probability of observing $x$ if the $H_1$ hypothesis
is true, i.e., $p(x|H_1)$ (Eq. 10), we need to integrate over
an interval between 0 and $H_s$. To determine $H_s$ we have
smoothed the power spectrum over $S$ $\mu$Hz. $H_s$ then
equals the maximum height of the smoothed spectrum
minus the mean background noise level. To determine
the optimum $S$ we performed Monte Carlo simulations
and determined the false detection rates. Spectra were
generated using the asteroFLAG code (Chaplin et al. 2008)
for 1000 different stars with random inclinations. We then
used different values of $S$ to determine $H_s$ and from the
obtained candidate frequencies we determined the ratio
of the number of candidates that are actually modes to the
total number of mode candidates. The higher this ratio, the
lower the proportion of false detections. We also determined
the total number of modes that could be detected in each
case to ensure that the method was producing a reasonable
number of mode candidates. Note that a mode was counted
as a detection if it lay within 2 linewidths of the input
frequency and if the posterior probability was less than
0.005. We repeated the simulations for modes with different
widths, including 0.3, 1.0, 1.7, 2.4, 3.1 and 3.8 times the solar
widths, for oscillations in the range 2000-4500 $\mu$Hz. The
optimum value of $S$ was found to be $\sim$10 $\mu$Hz, even for
the smallest line width. Similar simulations were performed
for stars whose oscillations lay in the range 200-450 $\mu$Hz.
It was found that for frequencies in this oscillation range
it was more appropriate to smooth over a narrower $S$ to
determine $H_s$. If the oscillation frequencies are < 450 $\mu$Hz
we determined $H_s$ by smoothing over $S = 1$ $\mu$Hz.
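
The determination of $H_s$ described above can be sketched in Python as follows. Several choices here are assumptions made only for illustration: the smoothing kernel is taken to be a boxcar (the text does not name one), the names `boxcar_smooth` and `estimate_h_s` are hypothetical, and $S = 10$ $\mu$Hz is applied to every star whose oscillations are not below 450 $\mu$Hz, although the text only quotes that value for the 2000-4500 $\mu$Hz range.

```python
def boxcar_smooth(power, width_bins):
    """Running mean over a centred window of 2 * (width_bins // 2) + 1 bins,
    truncated at the edges of the spectrum."""
    if width_bins < 1:
        raise ValueError("width_bins must be at least 1")
    half = width_bins // 2
    n = len(power)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(sum(power[lo:hi]) / (hi - lo))
    return out


def estimate_h_s(power, resolution, background, osc_freq):
    """H_s = maximum of the smoothed spectrum minus the mean background.

    power      -- power spectrum values
    resolution -- frequency bin width in muHz
    background -- mean background noise level, same units as power
    osc_freq   -- representative oscillation frequency in muHz, used to pick S
    """
    # S values quoted in the text; using 10 muHz everywhere above 450 muHz
    # is an assumption of this sketch
    s_muhz = 1.0 if osc_freq < 450.0 else 10.0
    width_bins = max(1, int(round(s_muhz / resolution)))
    return max(boxcar_smooth(power, width_bins)) - background
```

A wider window lowers the smoothed maximum of a narrow peak, so the choice of $S$ directly sets the upper limit of the integration interval in $p(x|H_1)$.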
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